我们考虑开放的联合学习(FL)系统,客户可以在FL过程中加入和/或离开系统。鉴于当前客户端数量的差异,在开放系统中不能保证与固定模型的收敛性。取而代之的是,我们求助于一个新的性能指标,该指标称我们的开放式FL系统的稳定性为量,该指标量化了开放系统中学习模型的幅度。在假设本地客户端的功能强烈凸出和平滑的假设下,我们从理论上量化了两种FL算法的稳定性半径,即本地SGD和本地ADAM。我们观察到此半径依赖于几个关键参数,包括功能条件号以及随机梯度的方差。通过对合成和现实世界基准数据集的数值模拟,我们的理论结果得到了进一步验证。
translated by 谷歌翻译
线性时间流(LTI)系统的识别在控制和增强学习中起重要作用。文献中都对渐近时间和有限的离线系统识别进行了充分研究。对于在线系统识别,最近提出了具有反向体验重播(SGD RER)的随机梯度下降的想法,其中数据序列存储在几个缓冲区中,随机分脱水量(SGD)更新在每个缓冲区中向后进行,以使每个缓冲区向后进行。打破数据点之间的时间依赖关系。在这项工作的启发下,我们研究了通过多代理网络分布LTI系统的在线系统识别。我们将代理视为相同的LTI系统,网络目标是通过利用代理之间的通信共同估计系统参数。我们提出了DSGD-RER,SGD-RER算法的分布式变体,理论上表征了相对于网络大小的估计误差的改善。随着网络大小的增长,我们的数值实验证明了估计误差的减少。
translated by 谷歌翻译
深度高斯进程(DGP)使非参数方法能够量化复杂深机器学习模型的不确定性。 DGP模型的传统推理方法可以遭受高计算复杂性,因为它们需要使用核矩阵的大规模操作进行训练和推理。在这项工作中,我们提出了一种基于一系列高斯过程的准确推理和预测的有效方案,称为Tensor Markov高斯过程(TMGP)。我们构建称为分层扩展的TMGP的诱导近似。接下来,我们开发一个深入的TMGP(DTMGP)模型作为TMGPS的多个层次扩展的组成。所提出的DTMGP模型具有以下性质:(1)每个激活功能的输出是确定性的,而重量独立于标准高斯分布选择; (2)在训练或预测中,只有O(Polylog(M))(M)激活函数具有非零输出,这显着提高了计算效率。我们对实时数据集的数值实验显示了DTMGP与其他DGP型号的卓越计算效率。
translated by 谷歌翻译
在包括生成建模的各种机器学习应用中的两个概率措施中,已经证明了切片分歧的想法是成功的,并且包括计算两种测量的一维随机投影之间的“基地分歧”的预期值。然而,这种技术的拓扑,统计和计算后果尚未完整地确定。在本文中,我们的目标是弥合这种差距并导出切片概率分歧的各种理论特性。首先,我们表明切片保留了公制公理和分歧的弱连续性,这意味着切片分歧将共享相似的拓扑性质。然后,我们在基本发散属于积分概率度量类别的情况下精确结果。另一方面,我们在轻度条件下建立了切片分歧的样本复杂性并不依赖于问题尺寸。我们终于将一般结果应用于几个基地分歧,并说明了我们对合成和实际数据实验的理论。
translated by 谷歌翻译
随机特征方法已广泛用于大型机器学习中的内核近似。最近的一些研究已经探索了数据相关的功能,修改随机特征的随机oracle进行采样。虽然该领域的提出技术提高了近似值,但它们通常在单个学习任务上验证它们的适用性。在本文中,我们提出了一种特定于任务的评分规则,用于选择随机特征,该规则可以用于不同的应用程序具有一些调整。我们限制了我们对规范相关性分析(CCA)的注意,我们提供了一种新颖的,原则性指南,用于找到最大化规范相关性的得分函数。我们证明了这种方法,称为ORCCA,可以胜过(期望)具有默认内核的相应内核CCA。数值实验验证ORCCA明显优于CCA任务中的其他近似技术。
translated by 谷歌翻译
Charisma is considered as one's ability to attract and potentially also influence others. Clearly, there can be considerable interest from an artificial intelligence's (AI) perspective to provide it with such skill. Beyond, a plethora of use cases opens up for computational measurement of human charisma, such as for tutoring humans in the acquisition of charisma, mediating human-to-human conversation, or identifying charismatic individuals in big social data. A number of models exist that base charisma on various dimensions, often following the idea that charisma is given if someone could and would help others. Examples include influence (could help) and affability (would help) in scientific studies or power (could help), presence, and warmth (both would help) as a popular concept. Modelling high levels in these dimensions for humanoid robots or virtual agents, seems accomplishable. Beyond, also automatic measurement appears quite feasible with the recent advances in the related fields of Affective Computing and Social Signal Processing. Here, we, thereforem present a blueprint for building machines that can appear charismatic, but also analyse the charisma of others. To this end, we first provide the psychological perspective including different models of charisma and behavioural cues of it. We then switch to conversational charisma in spoken language as an exemplary modality that is essential for human-human and human-computer conversations. The computational perspective then deals with the recognition and generation of charismatic behaviour by AI. This includes an overview of the state of play in the field and the aforementioned blueprint. We then name exemplary use cases of computational charismatic skills before switching to ethical aspects and concluding this overview and perspective on building charisma-enabled AI.
translated by 谷歌翻译
Telling stories is an integral part of human communication which can evoke emotions and influence the affective states of the audience. Automatically modelling emotional trajectories in stories has thus attracted considerable scholarly interest. However, as most existing works have been limited to unsupervised dictionary-based approaches, there is no labelled benchmark for this task. We address this gap by introducing continuous valence and arousal annotations for an existing dataset of children's stories annotated with discrete emotion categories. We collect additional annotations for this data and map the originally categorical labels to the valence and arousal space. Leveraging recent advances in Natural Language Processing, we propose a set of novel Transformer-based methods for predicting valence and arousal signals over the course of written stories. We explore several strategies for fine-tuning a pretrained ELECTRA model and study the benefits of considering a sentence's context when inferring its emotionality. Moreover, we experiment with additional LSTM and Transformer layers. The best configuration achieves a Concordance Correlation Coefficient (CCC) of .7338 for valence and .6302 for arousal on the test set, demonstrating the suitability of our proposed approach. Our code and additional annotations are made available at https://github.com/lc0197/emotion_modelling_stories.
translated by 谷歌翻译
Automatic video captioning aims for a holistic visual scene understanding. It requires a mechanism for capturing temporal context in video frames and the ability to comprehend the actions and associations of objects in a given timeframe. Such a system should additionally learn to abstract video sequences into sensible representations as well as to generate natural written language. While the majority of captioning models focus solely on the visual inputs, little attention has been paid to the audiovisual modality. To tackle this issue, we propose a novel two-fold approach. First, we implement a reward-guided KL Divergence to train a video captioning model which is resilient towards token permutations. Second, we utilise a Bi-Modal Hierarchical Reinforcement Learning (BMHRL) Transformer architecture to capture long-term temporal dependencies of the input data as a foundation for our hierarchical captioning module. Using our BMHRL, we show the suitability of the HRL agent in the generation of content-complete and grammatically sound sentences by achieving $4.91$, $2.23$, and $10.80$ in BLEU3, BLEU4, and METEOR scores, respectively on the ActivityNet Captions dataset. Finally, we make our BMHRL framework and trained models publicly available for users and developers at https://github.com/d-rothen/bmhrl.
translated by 谷歌翻译
One of the major challenges in acoustic modelling of child speech is the rapid changes that occur in the children's articulators as they grow up, their differing growth rates and the subsequent high variability in the same age group. These high acoustic variations along with the scarcity of child speech corpora have impeded the development of a reliable speech recognition system for children. In this paper, a speaker- and age-invariant training approach based on adversarial multi-task learning is proposed. The system consists of one generator shared network that learns to generate speaker- and age-invariant features connected to three discrimination networks, for phoneme, age, and speaker. The generator network is trained to minimize the phoneme-discrimination loss and maximize the speaker- and age-discrimination losses in an adversarial multi-task learning fashion. The generator network is a Time Delay Neural Network (TDNN) architecture while the three discriminators are feed-forward networks. The system was applied to the OGI speech corpora and achieved a 13% reduction in the WER of the ASR.
translated by 谷歌翻译
幽默是人类情感和认知的重要因素。它的自动理解可以促进更自然的人类设备互动和人工智能的人性化。当前的幽默检测方法仅基于分阶段数据,使其不适用于“现实世界”应用程序。我们通过引入新颖的Passau自发足球教练幽默(Passau-SFCH)数据集来解决这种缺陷,包括大约11个小时的录音。在马丁的幽默风格问卷中提出的幽默及其尺寸(情感和方向)的存在,请注释Passau-SFCH数据集。我们进行了一系列实验,采用了经过预定的变压器,卷积神经网络和专家设计的功能。分析了每种模式(文本,音频,视频)的表现,以进行自发幽默识别,并研究了它们的互补性。我们的发现表明,对于对幽默及其情感的自动分析,面部表情是最有希望的,而幽默方向可以通过基于文本的功能进行建模。结果揭示了各种主题之间的差异,突出了幽默用法和风格的个性。此外,我们观察到决策级融合会产生最佳认可结果。最后,我们在https://www.github.com/eihw/passau-sfch上公开代码。可以根据要求获得Passau-SFCH数据集。
translated by 谷歌翻译